SUMMARY

This research modifies the Bayesian techniques for determining the
minimum sample size required to construct interval estimates of the
true mean of an experimental or sampling process which is modeled by a
normal distribution with unknown parameters. The procedure considers
only the case where the prior information can be represented by a
normal distribution with known mean and known variance.

Rigorous Bayesian analysis of this situation would result in using a
posterior distribution which has a normal-gamma density to construct
interval estimates. In order to circumvent the obvious difficulties of
working with this rather complex density, a procedure, which is felt to
be more compatible with the U. S. Army Operational Testing environment,
is offered for approximating the required Bayesian sample size.

If the variance of the sampling or experimental process, $\sigma^2$,
were known, the minimum Bayesian sample size required to construct an
interval estimate about the true mean, $\mu$, with confidence
coefficient, $1 - \alpha$, and width, $k$, is:

$$n^*_b = \left[\frac{2\,Z_{\alpha/2}\,\sigma}{k}\right]^2 - \frac{\sigma^2}{\sigma'^2}$$

where $Z_{\alpha/2}$ refers to the percentage points of the standard
normal distribution such that $P(Z > Z_{\alpha/2}) = \alpha/2$, and
$\sigma'^2$ is the variance of the prior distribution or, equivalently,
the prior variance of the true sampling mean, $\mu$.
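
The known-variance sample size can be motivated by a short derivation.
The sketch below assumes the standard conjugate result for a normal
prior and a normal sampling process with known variance (posterior
precision equals prior precision plus sample precision). The symbol
$\sigma''^2$ for the posterior variance of the mean is introduced here
only for illustration.

```latex
\begin{align*}
\frac{1}{\sigma''^2} &= \frac{1}{\sigma'^2} + \frac{n}{\sigma^2}
  && \text{(posterior precision of the mean)} \\
k &= 2\,Z_{\alpha/2}\,\sigma''
  && \text{(width of the posterior interval)} \\
\frac{1}{\sigma'^2} + \frac{n}{\sigma^2} &= \left(\frac{2\,Z_{\alpha/2}}{k}\right)^2
  && \text{(substitute and square)} \\
n &= \left(\frac{2\,Z_{\alpha/2}\,\sigma}{k}\right)^2 - \frac{\sigma^2}{\sigma'^2}
  && \text{(solve for } n\text{)}
\end{align*}
```

As the prior variance $\sigma'^2$ grows without bound, the second term
vanishes and the expression reduces to the usual known-variance
classical sample size; a tighter prior reduces the number of
replications required.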


Substituting the sample variance, $S^2$, for the true process variance
and the term $t(\alpha/2, n^*_c - 1)$ for $Z_{\alpha/2}$ in the above
expression, and fixing the width of the confidence interval as a
function of the sample standard deviation, i.e., $k = \delta S$,
results in the following expression for the approximate Bayesian
sample size:

$$n^*_b = \left[\frac{2\,t(\alpha/2,\, n^*_c - 1)}{\delta}\right]^2 - \frac{S^2}{\sigma'^2}$$

where $n^*_c$ is the classical sample size required to construct an
interval estimate of width $k$ about the true mean, $\mu$, of the
sampling process. The term $t(\alpha/2, n^*_c - 1)$ refers to the
percentage points of the Student's t distribution with $n^*_c - 1$
degrees of freedom such that $P[t > t(\alpha/2, n^*_c - 1)] = \alpha/2$.

The  expression  for  the  approximate  Bayesian  sample  size  is  solved 
iteratively,  starting  with  a fraction  of  the  classical  sample  size 
required  for  the  interval  estimate  of  the  same  specified  confidence 
and  accuracy,  as  the  first  approximation.  The  iterative  procedure  is 
programed  for  a UNIVAC  1108  computer  and  applied  to  a hypothetical 
example  to  demonstrate  the  effectiveness  of  the  methodology. 

If accurate prior information is available, the results achieved by
the procedure developed to approximate the Bayesian sample size and
construct interval estimates of the unknown sampling mean, $\mu$, of
specified confidence and accuracy are comparable to the results
obtained using classical techniques. These results, however, are
achieved using smaller sample sizes than required for the classical
case. A procedure was suggested for examining the accuracy of the
prior information and ascertaining whether or not Bayesian analysis
was appropriate for a given sampling or experimental situation.
However, the expected results of using this procedure were not
obtained.


CHAPTER  I 


INTRODUCTION 
The  General  Problem 

This study is an investigation of the problem of determining the
minimum sample size of an experiment, that is, the minimum number of
replications of the experiment, required to estimate the mean of the
experimental variable to within a predetermined accuracy. In general,
the study is limited to a certain type of testing situation which has
the following characteristics:

a. The test variable, whose mean is to be estimated, can be modeled
by a normal distribution with unknown mean and unknown variance.

b. Information is available, prior to sampling or experimenting,
from which a probability distribution of the mean of the test variable
can be constructed.

c. This prior information can be represented by a normal
distribution with known mean, $m'$, and known variance, $\sigma'^2$.

The  Specific  Problem 

A U. S. Army Operational Test (OT) is an overall evaluation of a
system which has been developed for general use within the U. S. Army
structure (2). In this context, a "system" may be not only hardware,
but also doctrinal concepts, and is usually a mixture of both. The
test is conducted in an environment which duplicates or closely
simulates those conditions under which the system will be employed if
it is adopted for general use. It is this fact that generally
differentiates OT from engineering, developmental, pre-production, and
other tests which may be conducted on the same system and which are
usually a part of the total development scheme of the system. In fact,
OT is required to be an independent evaluation of the system. Thus, OT
is a vital part of the process by which new equipment and concepts are
incorporated into the U. S. Army structure.

An  Operational  Test  is  in  essence  a systematic  plan  for  evaluating 
the  total  system  being  tested.  It  is  composed  of  numerous  subtests 
which  address  specific  issues  (unknown  parameters)  which  are  considered 
critical  or  paramount  to  the  total  evaluation  of  the  system  (3).  The 
specific  critical  issues  to  be  evaluated  by  each  subtest  and  the  order 
of  these  tests  govern  the  overall  structure  of  the  Operational  Test. 

Once the specific structure of the Operational Test has been
established, a decision must be made as to the number of replications
of each subtest to conduct in order to properly evaluate the critical
issue in question. Time and budget constraints place emphasis on
conducting the minimum number of replications possible, while the
disastrous consequences that could result if a critical issue is not
properly evaluated make it imperative that accuracy not be sacrificed
for economy. Thus, the problem reduces to one of determining the
minimum number of replications of each subtest to conduct in order to
evaluate the critical issues in question to within a predetermined
accuracy.

Current procedural and policy documents governing the conduct of
Operational Tests (3, 4, 11) suggest that for the most part, sample
sizes are determined using non-Bayesian or classical statistical
methods. These methods do not consider prior information available
concerning the variable being tested; therefore, inferences and
decisions about the variable are based entirely on the experimental or
sampling results. Bayesian techniques, on the other hand, attempt to
use both the prior information and the experimental results in making
inferences and decisions about the variable. Thus, this investigation
is essentially a search for a practical procedure for applying
Bayesian techniques to Operational Testing. The principal Operations
Research tools used in this study are statistical inference and
estimation techniques to develop the methodology, and computer
simulation techniques to demonstrate the procedures developed.

Operational Testing is an expensive undertaking which must operate in
an environment constrained by budget and time considerations. The
author believes that a methodology which effectively reduces the
number of replications required to evaluate the critical issues
addressed by each subtest, while maintaining the accuracy and
confidence desired of the test, is a worthwhile pursuit directly
applicable to the Operational Testing environment.

Background 

During the last decade, there has been an increasing emphasis and
drive within the military community to develop and formalize a
methodology to adequately identify and evaluate the risks associated
with the development and procurement of major weapons systems. The
underlying premise which initiated this action was that unanticipated
cost and time over-runs and performance shortcomings, which had become
increasingly prevalent, were the result of inadequate assessment of
the risks involved with the materiel acquisition process. The
methodology which grew out of this effort is known as decision risk
analysis. In a report prepared for the Army Materiel Systems Analysis
Agency (AMSAA), Atzinger, Brooks, et al. (6) present a brief history
and description of the major concepts of the decision risk analysis
process. The authors define risk analysis as follows: "Decision risk
analysis is a discipline of systems analysis, which in a structured
manner, provides a meaningful measure of the risks associated with
various alternatives." The purpose of the report is to structure this
decision risk analysis process so that the trade-offs inherent in the
alternatives are visibly and meaningfully displayed. It cites the
following four major areas as the underlying concepts of decision risk
analysis:

a.  Subjective  Probability 

b.  Monte  Carlo  Methods 

c.  Network  Analysis 

d.  Bayesian  Statistics 

Bayesian statistics and Bayes Theorem have attracted renewed interest
in many fields of applied and theoretical statistics in recent years.
This theorem is essentially a mechanism for combining new information
with previously available information so that decisions or inferences
can be based on all the information available. Over the years, a
controversy has developed between the Bayesian and the more orthodox,
classical statistical concepts. Anscombe (1) provides a brief history
of the development of both philosophies:

During the last few years there has been a revival of interest among
statistical theorists in a mode of argument going back to the
Reverend Thomas Bayes¹ (1702-61), Presbyterian minister at Tunbridge
Wells in England, who wrote an "Essay Towards Solving a Problem in
the Doctrine of Chances," which was published in 1763 after his
death. Bayes' work was incorporated in a great development of
probability theory by Laplace and many others, which had general
currency right into the early years of the century. Since then there
has been an enormous development of theoretical statistics, by R. A.
Fisher, J. Neyman, E. S. Pearson, A. Wald and many others, in which
the methods and concepts of inference used by Bayes and Laplace have
been rejected.

The orthodox statistician, during the last twenty-five years or so,
has sought to handle inference problems (problems of deciding what
the figures mean and what ought to be done about them) with the
utmost objectivity. He explains his favorite concepts, significance
level, confidence coefficient, unbiased estimates, etc., in terms of
what he calls probability, but his notion of probability bears little
resemblance to what the man in the street means (rightly) by
probability. He is not concerned with probable truth or plausibility,
but he defined probability in terms of frequency of occurrence in
repeated trials, as in a game of chance. He views his inference
problems as matters of routine, and tries to devise procedures that
will work well in the long run. Elements of personal judgment are as
far as possible to be excluded from statistical calculations.
Admittedly, a statistician has to be able to exercise judgment, but
he should be discreet about it and at all costs keep it out of the
theory. In fact, orthodox statisticians show a great diversity in
their practice, and in the explanations they give for their practice;
and so the above remarks, and some of the following ones, are no
better than crude generalizations. As such, they are, I believe,
defensible. (Perhaps it should be explicitly said that Fisher, who
contributed so much to the development of the orthodox school,
nevertheless holds an unorthodox position not far removed from the
Bayesian; and that some other orthodox statisticians, notably Wald,
have made much use of formal Bayesian methods, to which no
probabilistic significance is attached.)

The revived interest in Bayesian inference starts with another
posthumous essay on "Truth and Probability," by F. P. Ramsey²
(1903-30), who conceived of a theory of consistent behavior by a
person faced with uncertainty. Extensive developments were made by
B. de Finetti and (from a rather different point of view) by
H. Jeffreys. For mathematical statisticians the most thorough study
of such a theory is that of L. J. Savage³,⁴. R. Schlaifer⁵ has
persuasively illustrated the new approach by reference to a variety
of business and industrial problems. Anyone curious to obtain some
insight into the Bayesian method, without mathematical hardship,
cannot do better than browse in Schlaifer's book.

The Bayesian statistician attempts to show how the evidence of
observations should modify previously held beliefs in the formation
of rational opinions, and how on the basis of such opinions and of
value judgments a rational choice can be made between alternative
available actions. For him probability really means probability. He
is concerned with judgments in the face of uncertainty, and he tries
to make the process of judgment as explicit and orderly as possible.

Atzinger,  Brooks,  et  al.,  (6)  obviously  consider  Bayesian 
statistical  procedures  to  have  great  potential  in  the  decision  risk 
analysis  process;  they  state: 

Bayesian  statistics  enjoys  a unique  position  in  risk  analysis. 
There frequently exist situations where the analyst has both data
and  expert  judgment  to  draw  upon  in  constructing  the  probability 
distribution  of  interest  in  the  consolidation  activity.  Bayesian 
statistics  provides  the  analyst  with  a tool  for  synthesizing  all 
of  this  information  into  one  probability  distribution  which  can 
then  be  used  to  directly  estimate  risks. 


¹ Bayes, T., "Essay Towards Solving a Problem in the Doctrine of
Chances," reprinted with bibliographical note by G. A. Barnard,
Biometrika, 45 (1958), 293-315.

² Ramsey, F. P., The Foundations of Mathematics, London: Routledge
and Kegan Paul, 1931.

³ Savage, L. J., The Foundations of Statistics, New York, John Wiley,
1954.

⁴ Savage, L. J., Subjective Probability and Statistical Practice, to
be published in a Methuen Monograph.

⁵ Schlaifer, R., Probability and Statistics for Business Decisions:
An Introduction to Managerial Economics Under Uncertainty, New York,
McGraw-Hill, 1959.


Review of the Literature


The statistical literature dealing with sample size determination is
quite extensive, particularly in the area of classical techniques.
Mace (12) provides an excellent and thorough coverage of classical
procedures for determining the optimum sample size of a research
experiment. This publication is applications oriented and provides
procedures, formulas, and tables for determining economical sample
sizes for some forty different types of research objectives.
Unfortunately, the author considers only one rather limited
application of Bayesian techniques to sample size determination. The
limitation in this particular example, that the variance of the
sampling process must be known, seems to occur quite frequently in the
literature of Bayesian techniques for determining minimum sample
sizes.

There has been extensive research in the application of Bayesian
techniques to reliability engineering and quality control. White (15)
presents a promising methodology for periodic reliability assessment
using Bayesian techniques to combine analytical predictions with
limited test results to obtain greater precision in the reliability
estimate. The main limitation of this paper is that it considers only
the gamma distribution in the analysis. Gilbreath (8) has devised
sampling procedures for use in sequential sampling models which have
direct application in quality control and in economic lot size
determination. These techniques, however, are more applicable to
hypothesis testing than to the estimation problem.

Atzinger and Brooks (5) provide an excellent comparison of Bayesian
and classical decision making under uncertainty for a class of
problems where the decision variable is the Bernoulli success
probability, p. If the outcome of any particular test or experiment is
viewed as a success or failure, the resulting data classification is
characteristic of a Bernoulli process. The authors persuasively argue
that historically, one of the major objectives in test and evaluation
processes has been to estimate this unknown Bernoulli success
parameter. Unfortunately, such an analysis does not address the actual
parameters of the sampling or experimental process itself.

Winkler (16) provides a rather detailed and complete development and
treatment of Bayesian applications to inference and decision theory at
the introductory level. Although the concepts developed in this
publication are very thoroughly covered, the scope of the material is
rather limited. That is, only two specific sampling processes are
analyzed in detail: the sampling process modeled by the Bernoulli
distribution, and the sampling process represented by the normal
distribution with known variance.

Raiffa and Schlaifer (13) provide an extensive mathematical
development of Bayesian techniques applied to statistical decision
theory. However, once again, extensive analysis of the normal
distribution is generally restricted to the case where the variance of
the sampling population is known.

Thus, Bayesian applications to the problem of sample-size
determination deal only with very specialized situations in the
current literature. There appears to be no substantial research into
the examination of the general problem. On the other hand, classical
statistical techniques commonly apply iterative algorithms to the
general problem of sample size determination. The author believes that
these techniques can be validly extended to Bayesian analysis and
produce equally valid results. The aim of this investigation, then, is
to extend the application of these well known techniques to the
general sampling situation using Bayesian analysis.


CHAPTER  II


THE  TEST  METHODOLOGY 


The  Assumptions  of  Normality 

The normality assumptions stated in the introduction are crucial,
albeit restrictive, to this investigation. The assumption that the
prior distribution, which represents the distribution of the mean of a
random variable, is normal has solid support in the Central Limit
Theorem. Hines and Montgomery (9) state the essence of this important
theorem as follows:

If $X_1, X_2, \ldots, X_n$ is a sequence of n independent random
variables with $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$ (both
finite) and $Y = X_1 + X_2 + \cdots + X_n$, then under some general
conditions

$$Z_n = \frac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

has an approximate N(0,1) distribution as n approaches infinity.

The "general conditions" mentioned in the theorem are informally
summarized as follows: The terms $X_i$, taken individually, contribute
a negligible amount to the variance of the sum, and it is not likely
that a single term makes a large contribution to the sum.

The principal implication of this theorem, then, is that in general
the sum of n independent random variables is approximately normally
distributed for sufficiently large n, regardless of the distribution
of the n individual random variables. Unfortunately, the assumption
that the random variable to be tested is normally distributed is much
more restrictive. However, in many cases, real-world situations can be
satisfactorily approximated by a normal process. Also, statistical
inference and estimation procedures, particularly those concerning the
mean of random variables, are generally robust (insensitive) to the
normality assumption (12).
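
The behavior described by the Central Limit Theorem can be checked
numerically. The following is a minimal sketch, not part of the
original study: it assumes Uniform(0, 1) summands, a sum of n = 30
terms, and a fixed random seed, all chosen only for illustration. It
standardizes the sum exactly as in the theorem and compares the result
with the N(0,1) distribution.

```python
import math
import random
import statistics


def standardized_sum(n, rng):
    """Standardize the sum of n independent Uniform(0, 1) variables."""
    total = sum(rng.random() for _ in range(n))
    mean_of_sum = n * 0.5              # sum of the individual means
    sd_of_sum = math.sqrt(n / 12.0)    # sqrt of the summed variances
    return (total - mean_of_sum) / sd_of_sum


def simulate(n=30, trials=20000, seed=1):
    """Return mean, standard deviation, and P(|Z| <= 1.96) of the sums."""
    rng = random.Random(seed)          # illustrative seed, an assumption
    z = [standardized_sum(n, rng) for _ in range(trials)]
    inside = sum(1 for v in z if abs(v) <= 1.96) / trials
    return statistics.mean(z), statistics.pstdev(z), inside


mean_z, sd_z, coverage = simulate()
print(f"mean = {mean_z:.3f}, sd = {sd_z:.3f}, "
      f"P(|Z| <= 1.96) = {coverage:.3f}")
```

If the approximation holds, the mean should be near 0, the standard
deviation near 1, and the proportion inside plus or minus 1.96 near
0.95, even though the individual summands are far from normal.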

The  Prior  Information 

At first glance, the requirement that Operational Testing be
independent of other testing conducted on the same system may seem an
insurmountable obstacle in attempting to obtain adequate prior
information. This, however, is usually not the case; other sources of
prior information do exist. For example, most new systems undergoing
testing have been specifically designed to replace older or outmoded
systems which are currently a part of the U. S. Army structure. These
older systems represent a vast source of historical data from which
prior distributions for nearly any critical issue can be developed. In
those rare cases where no historical data exist from which to
construct a prior distribution for a specific critical issue, the
Delphi technique or other proven methods of developing subjective
assessments of uncertainties can be used to develop the prior
distribution (6).

In any event, to the Bayesian statistician, the prior information
represents the best available estimate about an uncertain quantity,
regardless of its source. This fact even suggests that it is
reasonable and logical to modify the prior distribution developed from
historical data to reflect the improved design characteristics of the
new system. Suppose, for example, that one of the critical issues
being evaluated during OT of a new weapons system is the accuracy of
the weapon at a specified range. The distribution of the mean error of
similar weapons currently in use can be determined from historical
data. If the new system is expected to be significantly more accurate
because of new design characteristics, the mean of the prior
distribution developed from the historical data can be adjusted to
reflect the expected increase in the performance of the new system. In
discussing techniques for the assessment of prior distributions and
the use of diffuse prior distributions to represent the situation
where no prior information is available, Winkler (16) states:

It  should  be  stressed  that  in  general,  there  is  no  such  thing  as 
a "totally  informationless"  situation  and  the  use  of  particular 
distributions  to  represent  diffuse  prior  states  of  information  is 
a convenient  approximation  that  is  applicable  only  when  the  prior 
information  is  "overwhelmed"  by  the  sample  information.  In  most 
real-world  situations,  non-negligible  prior  information  (non- 
negligible  relative  to  the  sample  information)  is  available,  and 
the  concept  of  a diffuse  prior  distribution  is  not  applicable. 

The  Basic  Alternatives  of  Determining  Sample  Size 
This study considers only two basic approaches to determining the
appropriate sample size in an experimental process. One approach is to
simply disregard any prior knowledge or information available about
the variable of interest, and use classical statistical techniques to
solve the problem. The other approach is to combine the prior
information with the results of a limited number of replications of
the experiment, if possible, and then use these results to solve the
problem.

The Classical Method

Classical estimation procedures and techniques are well documented in
the literature (9, 10, 12). This method uses only the results of the
sampling or experimental process in the estimation procedures and
ignores all prior information. Starting from the basic assumption that
the sampling process is normally distributed with unknown mean, $\mu$,
and unknown variance, $\sigma^2$, the random variable representing the
outcome of the sampling process can be represented by:

$$X_i \sim N(\mu, \sigma^2), \quad \text{with } \mu,\ \sigma^2 \text{ unknown}$$


Let $(X_1, X_2, \ldots, X_n)$ represent the results of n replications
of the experiment. The sample statistics based on the specific n
values obtained from the sampling process can be expressed as:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \text{the sample mean}$$

and

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{the sample variance}$$


The appropriate expression for a $100(1 - \alpha)$ percent confidence
interval about the unknown mean, $\mu$, for a process which is
normally distributed and for which the variance is unknown is
constructed using the Student's t distribution, i.e.:

$$P\left(\bar{X} - t(\alpha/2, n-1)\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t(\alpha/2, n-1)\,\frac{S}{\sqrt{n}}\right) = 1 - \alpha \qquad (2\text{-}1)$$

where the expression $t(\alpha/2, n-1)$ refers to the percentage
points of the Student's t distribution with n-1 degrees of freedom
such that $P(t > t(\alpha/2, n-1)) = \alpha/2$.
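
As an illustration of equation (2-1), the sketch below computes the
sample mean, the sample standard deviation, and the resulting interval
limits. The ten observations are hypothetical values invented for this
example, and the percentage point t(0.025, 9) = 2.262 is supplied from
a standard t table rather than computed.

```python
import math
import statistics


def t_interval(sample, t_crit):
    """Limits of the interval X-bar +/- t_crit * S / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)       # uses the n - 1 divisor
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical observations, used only to exercise the formula.
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2, 10.5, 9.7]
T_CRIT = 2.262                         # t(0.025, 9) from a t table

lower, upper = t_interval(sample, T_CRIT)
print(f"95 percent interval: ({lower:.3f}, {upper:.3f})")
```

The width of the interval, upper minus lower, shrinks in proportion to
the square root of n, which is the relationship exploited when the
sample size is chosen to achieve a preselected width.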

Recall from the introduction that the classical interpretation of
probability differs considerably from the Bayesian interpretation.
Thus, the interpretation of equation (2-1) is based on long-run
considerations. That is, the classical statistician would say that if
a confidence interval based on a sample size of n is constructed each
time, then in the long run, $100(1 - \alpha)$ percent of such
intervals would contain the true mean of the normally distributed
sampling process. The value of $\alpha$, which is preselected at some
low value, can then be thought of as protection against failure of the
interval to include the true value of the mean of the sampling
process. The value $\alpha = 0.05$ is often selected for statistical
inference and estimation problems because of traditional usage. The
second type of error that can occur in interval estimation problems is
that the interval constructed based on a set of specific sample
results may be too wide, even though the interval does include the
true value of the mean of the sampling process. This, then, is a
problem of the accuracy associated with the confidence interval.
Protection against this type of error is accomplished by controlling
the width of the confidence interval constructed. The width of each
specific confidence interval is dependent on the sample size and the
value of $\alpha$ specified.

The terms

$$LL = \bar{X} - t(\alpha/2, n-1)\,\frac{S}{\sqrt{n}}$$

and

$$UL = \bar{X} + t(\alpha/2, n-1)\,\frac{S}{\sqrt{n}}$$

which are real-valued functions of the sample results, are the lower
and upper limits, respectively, of the interval estimate. The
Student's t distribution is very similar to the standard normal
distribution, and for degrees of freedom $\nu = n-1 > 20$, the two
distributions are virtually indistinguishable. In fact, the Student's
t distribution is identical to the standard normal distribution for
degrees of freedom $\nu = \infty$ (10). This fact allows accurate
approximations in computing the minimum sample size by approximating
the value of $t(\alpha/2, n-1)$ by $t(\alpha/2, \infty) = Z_{\alpha/2}$
for moderate sample sizes. The expression $Z_{\alpha/2}$ refers to the
percentage points of the standard normal distribution such that
$P(Z > Z_{\alpha/2}) = \alpha/2$.

For the moment, let the preselected width of the confidence interval
be simply equal to k. Then from equation (2-1), the half-interval
width can be expressed as:

$$t(\alpha/2, n-1)\,\frac{S}{\sqrt{n}} = \frac{k}{2}$$

Solving this equation for n results in the following expression for
the minimum sample size required for a confidence interval width equal
to k:

$$n^*_c = \left[\frac{2\,t(\alpha/2,\, n^*_c - 1)\,S}{k}\right]^2 \qquad (2\text{-}2)$$

It is more convenient to express the width of the confidence interval
in terms of the sample standard deviation in order to simplify
equation (2-2). Thus, if $k = \delta S$ is substituted into the
equation, the minimum sample size required can then be expressed as:

$$n^*_c = \left[\frac{2\,t(\alpha/2,\, n^*_c - 1)}{\delta}\right]^2 \qquad (2\text{-}3)$$

Equation (2-3) cannot be solved explicitly for $n^*_c$, since the
value of $t(\alpha/2,\, n^*_c - 1)$ is itself a function of the sample size
$n^*_c$. But since $t(\alpha/2,\, n^*_c - 1)$ is approximately equal to
$t(\alpha/2,\, \infty) = Z_{\alpha/2}$ for moderate sample sizes, a good
first approximation to the solution of equation (2-3) is obtained by
substituting $Z_{\alpha/2}$ for $t(\alpha/2,\, n^*_c - 1)$. This first
approximation is known to be too small, although for large sample sizes
it is quite close to the actual value of $n^*_c$. This first
approximation, call it $n_0$, is then used to evaluate
$t(\alpha/2,\, n_0 - 1)$, and equation (2-3) is solved again to obtain a
better second approximation to $n^*_c$. This iterative procedure can be
used to approximate $n^*_c$ to any desired accuracy; however, there is
usually no significant improvement in the approximation after the second
or third iteration.
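The repeated substitution used to solve equation (2-3) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original program: the function names are hypothetical, and the Student t percentage point is approximated by a series expansion in 1/df about the normal percentage point, chosen here so that the sketch needs nothing beyond the standard library.

```python
import math
from statistics import NormalDist


def t_quantile(alpha_half, df):
    """Approximate upper percentage point t(alpha/2, df).

    Assumed stand-in for a library routine: a series in 1/df about the
    normal percentage point, adequate for df of roughly 10 or more.
    """
    z = NormalDist().inv_cdf(1.0 - alpha_half)
    g1 = (z**3 + z) / 4.0
    g2 = (5.0 * z**5 + 16.0 * z**3 + 3.0 * z) / 96.0
    g3 = (3.0 * z**7 + 19.0 * z**5 + 17.0 * z**3 - 15.0 * z) / 384.0
    return z + g1 / df + g2 / df**2 + g3 / df**3


def classical_sample_size(delta, alpha=0.05, tol=1e-6, max_iter=100):
    """Solve n = [2 t(alpha/2, n - 1) / delta]^2 by repeated substitution."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = (2.0 * z / delta) ** 2              # first approximation, using Z
    for _ in range(max_iter):
        n_next = (2.0 * t_quantile(alpha / 2.0, n - 1.0) / delta) ** 2
        converged = abs(n_next - n) < tol
        n = n_next
        if converged:
            break
    return math.ceil(n)                     # round up to the next integer


if __name__ == "__main__":
    for delta in (1.0, 0.5, 0.2):
        print(delta, classical_sample_size(delta))
```

For delta = 1.0, 0.5, and 0.2 this sketch returns 18, 64, and 387, which agree with the corresponding entries of Table 1. The sample size is carried as a real number during the iteration and rounded up only at the end; that ordering is a choice made for this sketch.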


Table 1 shows the values of $n^*_c$ obtained for a 95 percent
($\alpha = 0.05$) confidence interval for various values of $\delta$ using
this iterative procedure. Because of the premium placed on accurate
estimates in Operational Testing, values of $\delta > 1.0$ were not
considered. The values shown in the table under the heading P(K) are the
approximate probabilities of a single observation from the sampling
process falling between the lower and upper limits of the confidence
interval, i.e., $P(K) = P(LL < x < UL)$. This value gives a probabilistic
measure of the accuracy (width) of the confidence interval. The values
of $n^*_c$ in the table have been rounded up to the next highest integer.
As illustrated in Table 1, equation (2-3) implies that in order to
decrease the width of a confidence interval by one-half, the sample
size must be increased approximately by a factor of four.


Table 1. Minimum Sample Size - Classical Method

  $\delta$     P(K)     $n^*_c$
  -----    -----    -----
   1.0     0.383       18
   0.9     0.347       22
   0.8     0.311       27
   0.7     0.274       34
   0.6     0.236       46
   0.5     0.197       64
   0.4     0.159       99
   0.3     0.119      174
   0.2     0.080      387
   0.1     0.040
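The P(K) column can be checked with a short calculation. The sketch below assumes that P(K) is the probability that a single normal observation falls within $\delta\sigma/2$ of the process mean, i.e. $2\Phi(\delta/2) - 1$, with S standing in for $\sigma$. That reading is an inference from the tabulated numbers rather than a formula stated in the text, and the function name is hypothetical.

```python
from statistics import NormalDist


def p_k(delta):
    """Probability that one observation lies within delta*sigma/2 of the mean.

    Assumes the interval of width delta*S is centered on the process mean
    and that S is close to sigma.
    """
    return 2.0 * NormalDist().cdf(delta / 2.0) - 1.0


if __name__ == "__main__":
    for delta in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1):
        print(f"{delta:.1f}  {p_k(delta):.3f}")
```

Rounded to three decimals, these values reproduce the P(K) column of Table 1.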
A Bayesian Approximation

Bayes Theorem for Continuous Random Variables. The essence of
Bayes Theorem for continuous random variables is depicted in Figure 1
shown below. The densities $f(\theta)$ and $f(\theta \mid y)$ represent the
prior distribution and the posterior distribution respectively, and
$f(y \mid \theta)$ represents the likelihood or sampling function. It is
important to keep in mind always that it is the prior distribution, or
the statistician's prior state of knowledge, that is modified by the
sampling results and not the reverse.


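[Figure 1: the prior density $f(\theta)$ and the sample information $y$,
through the likelihood $f(y \mid \theta)$, combine to give the posterior
density $f(\theta \mid y)$.]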


Figure  1.  Bayes  Theorem  for  Continuous  Random  Variables 


The prior and posterior distributions must be proper density
functions. That is, they must possess the following mathematical
properties applicable to the density function of any continuous random
variable, x, which has range space or domain, $R_x$:

(i) $f(x) \ge 0$ for all $x \in R_x$

(ii) $\int_{R_x} f(x)\,dx = 1$
The likelihood or sampling function, $f(y \mid \theta)$, represents the
probability of obtaining a given value, y, for the range of possible
values of $\theta$. The likelihood function is not a proper density function
because the events $f(y \mid \theta)$ are not mutually exclusive over the range
of $\theta$.

As suggested in Figure 1, Bayes Theorem is essentially a process
of combining the prior distribution with the sample information to
yield the posterior distribution. The resultant posterior density has
the following form:

\[ f(\theta \mid y) = \frac{f(\theta)\, f(y \mid \theta)}{\int_{-\infty}^{\infty} f(\theta)\, f(y \mid \theta)\, d\theta} \tag{2-4} \]

This result can be expressed in words as:

posterior density = [normalizing constant] [prior density] [likelihood function]

where the normalizing constant, $1 / \int_{-\infty}^{\infty} f(\theta)\, f(y \mid \theta)\, d\theta$,
is needed to make the posterior distribution a proper density function.
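To make the role of the normalizing constant concrete, the following sketch evaluates equation (2-4) numerically on a grid, using the trapezoid rule for the integral. The prior, the single observation, and all function names are hypothetical values chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def trapezoid(values, h):
    """Trapezoid-rule integral of equally spaced values with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def grid_posterior(prior, likelihood, lo, hi, steps=4000):
    """Posterior density on a grid: prior times likelihood, normalized."""
    h = (hi - lo) / steps
    thetas = [lo + i * h for i in range(steps + 1)]
    unnormalized = [prior(t) * likelihood(t) for t in thetas]
    constant = trapezoid(unnormalized, h)      # the integral in (2-4)
    return thetas, [u / constant for u in unnormalized], h


# Hypothetical case: prior N(10, 2^2), one observation y = 12 from N(theta, 1).
thetas, post, h = grid_posterior(
    lambda t: normal_pdf(t, 10.0, 2.0),
    lambda t: normal_pdf(12.0, t, 1.0),
    0.0, 20.0,
)
area = trapezoid(post, h)
post_mean = trapezoid([t * p for t, p in zip(thetas, post)], h)
```

In this hypothetical case the normalized density integrates to one and its mean is 11.6, the value the conjugate normal formulas give for the same inputs. A numerical route of this kind is what the conjugate families described next were developed to avoid.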

Before the advent of the high speed computer, which greatly eased
the computational burden involved with numerical integration techniques,
application of equation (2-4) to revise density functions in the light
of sample information often proved extremely difficult because of the
integration required to compute the normalizing constant. For this
reason, Bayesian statisticians developed the concept of "conjugate"
distributions, which are families of distributions that ease the
computational burden when they are used as prior distributions (16). Of
course the resultant form of the posterior distribution depends on the
likelihood function as well as the prior distribution. Thus, conjugate
prior distributions are selected on the basis of the statistical
properties of the model chosen to represent the sampling process. When
the prior distribution is conjugate to the likelihood or sampling
function, the resultant posterior distribution is also a member of
the same conjugate family of distributions.

Bayes Theorem for Normal Distributions. If it is possible to
model the population or process being sampled by a normal distribution,
the proper choice for a family of conjugate prior distributions depends
on the statistician's knowledge of the parameters of the normal data
generating process used. Raiffa and Schlaifer (13) summarize the
effects of the statistician's knowledge of the two parameters of the
normal distribution on the proper choice of conjugate prior
distributions as follows:

Case (i) $\mu$ known, $\sigma^2$ unknown: The appropriate family of
conjugate distributions has a gamma-2 density.

Case (ii) $\sigma^2$ known, $\mu$ unknown: The appropriate family of
conjugate distributions has a normal density.

Case (iii) both $\mu$ and $\sigma^2$ unknown: The appropriate family of
conjugate distributions has a normal-gamma density.

An Approximation Procedure. Since it was assumed that within
the context of this study the model representing the sampling process
in Operational Testing was normally distributed with unknown mean, $\mu$,
and unknown variance, $\sigma^2$, the appropriate family of conjugate
distributions to use in this case has a normal-gamma density. In order to
overcome the obvious difficulties associated with computing interval
estimates with the normal-gamma density, a procedure is suggested here
to modify the Bayesian analysis of this sampling process so that the
family of conjugate prior distributions has a normal density function,
as is the case when the variance of the population or sampling process
is known.

Assume  for  the  moment  that  the  variance  of  the  sampling  process 
is  known.  Then  the  conjugate  prior  distribution  has  a normal  density 
function  of  the  form: 


\[ f'(\mu) = \frac{1}{\sqrt{2\pi\sigma'^2}}\; e^{-(\mu - m')^2 / 2\sigma'^2} \]

where the prime (') is used to signify a parameter or constant which
is associated with the prior distribution. Thus, $\sigma'^2$ is the variance
of the prior distribution or, the prior variance of the unknown
parameter, $\mu$; and $m'$ is the mean of the prior distribution of this
parameter.

If n replications of the experiment were now conducted and a
sample mean,

\[ m = \frac{1}{n} \sum_{i=1}^{n} x_i , \]

and a sample variance,

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2 , \]

were observed, the resultant posterior distribution would also have a
normal density function of the form:

\[ f''(\mu \mid y) = \frac{1}{\sqrt{2\pi\sigma''^2}}\; e^{-(\mu - m'')^2 / 2\sigma''^2} \]

where y represents the sample results, and the double prime (") is
used to indicate a parameter or constant which is associated with the
posterior distribution. Thus, $\sigma''^2$ is the posterior variance of $\mu$,
and $m''$ is the mean of the posterior distribution of $\mu$. These posterior
parameters can be computed from the following formulas:

\[ \frac{1}{\sigma''^2} = \frac{1}{\sigma'^2} + \frac{n}{\sigma^2} \tag{2-5} \]

and

\[ m'' = \frac{(1/\sigma'^2)\, m' + (n/\sigma^2)\, m}{(1/\sigma'^2) + (n/\sigma^2)} \tag{2-6} \]
Equations (2-5) and (2-6) indicate that the reciprocal of the
posterior variance is equal to the sum of the reciprocal of the prior
variance, $\sigma'^2$, and the reciprocal of the variance of the sample mean,
$\sigma^2/n$. The posterior mean is a weighted average of the prior mean, $m'$,
and the sample mean, m, the weights being the reciprocals of the
respective variances.

As depicted in Figure 2, an important feature of the posterior
distribution is that the posterior mean, $m''$, always lies between the
prior mean, $m'$, and the sample mean, m. The posterior variance, $\sigma''^2$,
is always smaller than the prior variance, $\sigma'^2$ (16). From equation
(2-5), if the variance of the prior distribution, $\sigma'^2$, decreases, the
amount of prior uncertainty decreases, and the prior information is
given more weight in the determination of the posterior distribution.
Similarly, as the variance of the sample mean, $\sigma^2/n$, decreases, the
sampling information is given more weight in the determination of the
posterior distribution.
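Equations (2-5) and (2-6) translate directly into a few lines of code. The sketch below is a minimal illustration; the function name and the numerical values are hypothetical.

```python
def posterior_normal(prior_mean, prior_var, sample_mean, process_var, n):
    """Posterior mean and variance for a normal mean with known process variance.

    Implements equations (2-5) and (2-6): precisions (reciprocal variances)
    add, and the posterior mean is the precision-weighted average.
    """
    prior_precision = 1.0 / prior_var
    sample_precision = n / process_var       # reciprocal of sigma^2 / n
    post_var = 1.0 / (prior_precision + sample_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + sample_precision * sample_mean)
    return post_mean, post_var


# Hypothetical values: prior N(55, 1), process variance 16, n = 16, m = 50.
post_mean, post_var = posterior_normal(55.0, 1.0, 50.0, 16.0, 16)
```

With these inputs the prior and the sample carry equal weight, so the posterior mean (52.5) falls midway between the prior mean and the sample mean, and the posterior variance (0.5) is smaller than the prior variance.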


Figure  2.  Bayes  Theorem  for  Normal  Distributions 


A different parameterization of this problem might help clarify
the results obtained. Let

\[ n' = \frac{\sigma^2}{\sigma'^2} \tag{2-7} \]

Then the prior variance can be written in terms of $n'$ and the process
or sampling variance, thus:

\[ \sigma'^2 = \frac{\sigma^2}{n'} \]

Similarly, if

\[ n'' = \frac{\sigma^2}{\sigma''^2} \]

then

\[ \sigma''^2 = \frac{\sigma^2}{n''} \]

Substituting these results into equations (2-5) and (2-6), the
parameters of the posterior distribution are then

\[ \frac{n''}{\sigma^2} = \frac{n'}{\sigma^2} + \frac{n}{\sigma^2} \]

or simply,

\[ n'' = n' + n \]

and

\[ m'' = \frac{(n'/\sigma^2)\, m' + (n/\sigma^2)\, m}{(n'/\sigma^2) + (n/\sigma^2)} \]

or simply,

\[ m'' = \frac{n' m' + n m}{n' + n} \tag{2-10} \]


In his interpretation of the results obtained by using these
new parameters, Winkler (16) suggests that the prior distribution can
be thought of as roughly equivalent to the information contained in a
sample of size $n'$ with a sample mean of $m'$ from a normal sampling
process with variance $\sigma^2$. That is, $n'$ appears to be the sample size
required to produce a variance of $\sigma'^2$ for a sample mean equal to $m'$,
since the variance of the sample mean from a sample of size $n'$ is equal
to $\sigma^2/n'$. Winkler also considers the above expressions for $n''$ and
$m''$ as formulas for pooling the information from the two samples. Under
this interpretation, the posterior or pooled sample size is equal to the
sum of the two individual sample sizes, one from the prior distribution
and one from the sampling process. The posterior or pooled sample mean
is equal to a weighted average of the two individual sample means.
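The pooling interpretation can be illustrated with a short sketch in which the prior is treated as an earlier sample of size $n'$. The function names and numbers are hypothetical.

```python
def equivalent_prior_sample_size(process_var, prior_var):
    """n' = sigma^2 / sigma'^2, the prior expressed as a sample size."""
    return process_var / prior_var


def pool_with_prior(prior_mean, n_prior, sample_mean, n):
    """Pooled (posterior) mean and sample size: n'' = n' + n and
    m'' = (n' m' + n m) / (n' + n)."""
    n_post = n_prior + n
    m_post = (n_prior * prior_mean + n * sample_mean) / n_post
    return m_post, n_post


# Hypothetical values: process variance 16, prior variance 1, so n' = 16.
n_prior = equivalent_prior_sample_size(16.0, 1.0)
m_post, n_post = pool_with_prior(55.0, n_prior, 50.0, 16)
post_var = 16.0 / n_post        # sigma''^2 = sigma^2 / n''
```

Here the prior counts as 16 observations, the pooled sample size is 32, and the pooled mean is 52.5. The implied posterior variance, 16/32 = 0.5, is the same value the precision form of the update gives for these inputs.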

This pooling process suggests that a reasonable estimate of the
mean of the sampling process, based on all the information available, is
the posterior or pooled mean, $m''$. Notice that if $n' > n$, then the
posterior or pooled mean is closer to the prior mean than to the sample
mean. That is, the prior information is given more importance than the
sample results in the determination of the posterior parameters. Of
course, the posterior mean is closer to the sample mean if $n > n'$; and
if $n' = n$, the posterior mean is exactly midway between the prior mean
and the sample mean. Notice also that since the sample mean, m, is as
likely to fall above as it is to fall below the true population or
sampling mean, $\mu$, it is equally likely that the sample mean and
the mean of the prior distribution, $m'$, will be on the same or opposite
sides of $\mu$. When $m'$ and m fall on the same side of $\mu$ and $m'$ is the
farther of the two from $\mu$, the mean of the posterior distribution, $m''$,
will be further from $\mu$ than the sample mean. That is, the posterior mean
will be a less accurate estimate of the true population mean than the
sample mean. When $m'$ and m are on opposite sides of $\mu$, it cannot be
determined in general whether the posterior mean will be closer to or
further from $\mu$ than the sample mean. Each specific case
must be examined separately; the results will depend on the sample
size, the specific value of the prior mean, and the variances of the
prior and sampling distributions.

Since the point estimate of $\mu$ based on all information available
is the posterior mean, and the posterior distribution of $\mu$ is normal
with mean $m''$ and variance $\sigma''^2$, the statistic

\[ Z = \frac{m'' - \mu}{\sigma''} \]

has a standard normal distribution, i.e., $Z \sim N(0, 1)$. Therefore the
appropriate expression for a $100(1 - \alpha)$ percent interval estimate of
$\mu$ for this case is constructed using the standard normal distribution,
i.e.,

\[ P(m'' - Z_{\alpha/2}\,\sigma'' \le \mu \le m'' + Z_{\alpha/2}\,\sigma'') = 1 - \alpha \tag{2-11} \]

The lower and upper limits of the confidence interval in this case are
$LL = m'' - Z_{\alpha/2}\,\sigma''$ and $UL = m'' + Z_{\alpha/2}\,\sigma''$, respectively.

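If, as was done in the classical case, the width of the
confidence interval for the general case is set equal to k, then from
equation (2-11) the half-interval width can be expressed as:

\[ Z_{\alpha/2}\,\sigma'' = \frac{k}{2} \]

Now substituting the expression for $\sigma''^2$ from equation (2-5) into the
above expression results in the following:

\[ Z_{\alpha/2} \left[ \frac{1}{1/\sigma'^2 + n/\sigma^2} \right]^{1/2} = \frac{k}{2} \]

\[ Z_{\alpha/2} \left[ \frac{\sigma'^2 \sigma^2}{\sigma^2 + n\sigma'^2} \right]^{1/2} = \frac{k}{2} \]

\[ \left[ \frac{2 Z_{\alpha/2}}{k} \right]^2 \sigma'^2 \sigma^2 = \sigma^2 + n\sigma'^2 \]

and finally,

\[ n^*_b = \left[ \frac{2\, Z_{\alpha/2}\, \sigma}{k} \right]^2 - \frac{\sigma^2}{\sigma'^2} \tag{2-12} \]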


This then, is the Bayesian solution for the minimum sample size
required to establish a confidence interval of width k about the mean
of the sampling process under the special condition that the variance
of the sampling process is known. Several characteristics of equation
(2-12) deserve mention. First of all, the first term in the equation,
$[2 Z_{\alpha/2}\,\sigma / k]^2$, is in fact the exact expression for the classical
solution to the problem of determining the minimum sample size required
to establish a confidence interval of width k about the mean of a
sampling process with known variance. Second, the last term in the
equation, $\sigma^2/\sigma'^2$, is the expression developed earlier for $n'$ in
equation (2-7). Recall Winkler's interpretation of $n'$ as being roughly
the equivalent sample size, relative to the sampling process, of the
information contained in the prior distribution. The ratio
$\sigma^2/\sigma'^2 = 0$ is also used to define a diffuse prior distribution,
i.e., an informationless prior state. Assuming that the variance of the
sampling process, $\sigma^2 > 0$, the ratio $\sigma^2/\sigma'^2 = 0$ only if the
variance of the prior distribution, $\sigma'^2 = \infty$. In this case, the
variance of the prior distribution would represent a condition of total
uncertainty and, since $n' = \sigma^2/\sigma'^2 = 0$, equation (2-12) would yield
the same results as in the classical case.
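Equation (2-12) and its diffuse-prior limit can be checked numerically. The sketch below is illustrative only: the function name and the values of the standard deviations and width are hypothetical, and the result is left unrounded.

```python
import math
from statistics import NormalDist


def bayes_sample_size_known_variance(sigma, sigma_prior, k, alpha=0.05):
    """Equation (2-12): classical size minus the prior's equivalent size.

    Returns (n_bayes, n_classical, n_prior). n_bayes can be zero or
    negative when the prior alone already meets the width requirement.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_classical = (2.0 * z * sigma / k) ** 2
    n_prior = (sigma / sigma_prior) ** 2
    return n_classical - n_prior, n_classical, n_prior


# Hypothetical case: sigma = 4, prior sd = 1 (so n' = 16), width k = 2.
n_b, n_c, n_p = bayes_sample_size_known_variance(4.0, 1.0, 2.0)

# Diffuse prior: infinite prior variance gives n' = 0 and the classical size.
n_b_diffuse, n_c_diffuse, _ = bayes_sample_size_known_variance(4.0, math.inf, 2.0)
```

In the hypothetical case the classical requirement of about 61.5 observations is reduced by the 16 observations the prior is worth. With an infinite prior standard deviation the two sample sizes coincide, as stated above.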

Tying all these facts together, equation (2-12) can be interpreted
as follows: the minimum Bayesian sample size required to establish
an interval estimate of the mean of any specified width or
accuracy is equal to the minimum sample size required to establish
the same interval estimate using classical methods, minus the value
of the prior information in terms of an equivalent sample size. Or,
more clearly:

\[ n^*_b = n^*_c - n' \]
Now, consider once again equation (2-12) in order to address the
fact that the variance of the sampling process is in fact not known.
Substituting the sample variance for the variance of the sampling
process and the term $t(\alpha/2,\, n^*_c - 1)$ for $Z_{\alpha/2}$ in equation
(2-12), and once again defining the width of the confidence interval as
$k = \delta S$, results in the following expression for the approximate
Bayesian sample size:

\[ n^*_b = \left[ \frac{2\, t(\alpha/2,\, n^*_c - 1)}{\delta} \right]^2 - \frac{S^2}{\sigma'^2} \tag{2-13} \]

where

\[ S^2 = \frac{\sum_{i=1}^{n^*_b} (x_i - m)^2}{n^*_b - 1} \]

and m is equal to the sample mean based on $n^*_b$ observations.

Examination of equation (2-13) reveals that the first term in
the equation is identical to equation (2-3), the classical solution
to the minimum sample size problem for a normal sampling process with
unknown variance. The last term in the equation is an approximation
of the equivalent sample size of the information contained in the prior
distribution, where the value of $n' = \sigma^2/\sigma'^2$ is approximated by
$n' = S^2/\sigma'^2$. Of course equation (2-13) cannot be evaluated explicitly,
even though the value of the first term in the equation is exactly known
from the results obtained using the classical method, since the value
of $S^2$ depends on the specific observations obtained during the sampling
process.
Before suggesting a procedure for approximating a solution to
equation (2-13) for the general case, it may be more appropriate at
this point to examine the general implications of using the posterior
distribution to construct confidence interval estimates about the mean
of the sampling process. An interval estimate based on the posterior
distribution has as its midpoint the posterior mean, $m''$, while the
midpoint of an interval estimate based on the sampling process alone is
the sample mean, m. Referring to Figure 2, it is obvious, then, that
an interval estimate of width $\delta S$ which is based on the posterior
distribution will not include the sample mean, m, if $m''$ and m are
separated by more than $\tfrac{1}{2}\delta S$. A large separation between $m''$ and m
is indicative of prior information which is not very compatible with the
sampling or experimental results. In other words, the prior
information does not predict the behavior of the sampling process very
well. This is an important consideration in Operational Testing, since
it is important to decide whether or not to use the prior information
in estimating the mean of the sampling or experimental process.

It would seem appropriate then, to develop at least a heuristic
rule to reject the use of prior information which causes the posterior
and sampling means to differ beyond some pre-established limit. The
general form of such a rule would be:

\[ |m'' - m| \le q\,\delta S \]

where the value of q is selected in a manner such that if the inequality
were not satisfied, the application of Bayesian techniques would be
aborted and the appropriate sample size for the specific situation would
be determined by using classical techniques.

Returning now to the problem of constructing an interval estimate
of width $\delta S$ for the mean of the sampling process using Bayesian
techniques, the following procedure is suggested as a reasonable approach
to approximating the solution of equation (2-13) for the general case.

a. Determine the minimum sample size, $n^*_c$, required for the
classical method. This value, call it $n_0$, is the upper limit of the
Bayesian sample size.

b. As a first approximation to the Bayesian sample size, let
$n_1 = n_0/d$, where the value of d is selected with consideration given
to the classical sample size being used. That is, for small values of
$n^*_c$, d should be chosen at some low value (such as 2 or 4) in order that
$n_1$ be large enough to yield suitable sample statistics. For large
values of $n^*_c$, d may be increased since the resulting samples would
still yield suitable statistics. The objective here is to approximate
the Bayesian sample size conservatively while ensuring that the
approximation decided upon is large enough to yield reasonably valid
sample statistics.

c. Conduct the $n_1$ replications of the experiment and from the
results compute the sample statistics:

\[ m_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_i \qquad \text{and} \qquad S_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (x_i - m_1)^2 \]

d. Use these statistics to compute the approximations:

\[ n'_1 = \frac{S_1^2}{\sigma'^2} \]

and

\[ m''_1 = \frac{n'_1 m' + n_1 m_1}{n'_1 + n_1} \]

e. Determine the second approximation of the Bayesian sample
size by using the value obtained for the first approximation and the
following relationship:

\[ n_2 = n_1 + \Delta (n_0 - n'_1) \]

where $\Delta$ is chosen with the same considerations as was the value of d.
The general expression for the jth approximation of the Bayesian sample
size is:

\[ n_j = n_{j-1} + \Delta (n_0 - n'_{j-1}) \]

f. Determine if sufficient replications of the experiment have
been conducted after each iteration by comparing the computed
approximation of the Bayesian sample size to the classical sample size
minus the computed value of $n'$. That is, continue the iterative procedure
until $n_j \ge n_0 - n'_j$.

g. After computing the final approximation of the Bayesian
sample size, determine if the prior information should be accepted or
rejected. That is, if $|m'' - m| \le q\,\delta S$, use the $n_j$ replications
already conducted to construct the interval estimate of the mean of the
experimental process using Bayesian techniques. If $|m'' - m| > q\,\delta S$,
reject the use of the prior information; conduct the remaining $n_0 - n_j$
replications of the experiment and construct the desired interval estimate
of the mean of the experimental process using classical techniques.
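Steps a through g can be sketched as a short simulation. This is a hedged illustration rather than the original program: the function name, the `draw` callback, the rounding of each approximation up to an integer, and the cap at $n_0$ are assumptions of this sketch, and the classical sample size $n_0$ from step a is supplied by the caller.

```python
import math
import random
import statistics


def approximate_bayesian_sample_size(draw, n0, prior_mean, prior_var,
                                     delta, d=4, step=0.25, q=3.0 / 8.0):
    """Sketch of steps b-g; draw() returns one new observation.

    n0 is the classical sample size (step a). Rounding each approximation
    up to an integer and capping it at n0 are assumptions of this sketch.
    """
    data = []
    n_j = max(2, math.ceil(n0 / d))                        # step b
    while True:
        while len(data) < n_j:                             # step c
            data.append(draw())
        m = statistics.fmean(data)
        s2 = statistics.variance(data)                     # divisor n_j - 1
        n_prior = s2 / prior_var                           # step d: n'_j
        m_post = (n_prior * prior_mean + n_j * m) / (n_prior + n_j)
        if n_j >= n0 - n_prior:                            # step f
            break
        grown = math.ceil(n_j + step * (n0 - n_prior))     # step e
        n_j = min(n0, max(n_j + 1, grown))
    accept = abs(m_post - m) <= q * delta * math.sqrt(s2)  # step g
    return {"n": n_j, "n_prior": n_prior, "m": m,
            "m_post": m_post, "accept_prior": accept}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical process N(50, 4^2); prior N(55, 1); delta = 0.5, n0 = 64.
    print(approximate_bayesian_sample_size(
        lambda: rng.gauss(50.0, 4.0), 64, 55.0, 1.0, 0.5))
```

Observations are accumulated across iterations, so each new approximation only requires the additional replications. If the prior is rejected in step g, the caller would draw the remaining $n_0 - n_j$ observations and fall back on the classical interval.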
CHAPTER III

DEMONSTRATION OF THE METHODOLOGY
Programming the Model

The model developed for approximating the minimum Bayesian sample
size for the special test situation described in Chapter I is programmed
for the UNIVAC 1108 computer using standard Fortran IV language. The
program consists of four basic segments designed to perform the
following functions: generate the required data and compute the sample
statistics; compute the classical sample size required for an
interval estimate of specified width; compute the approximate Bayesian
sample size required for the same interval width; and construct the
confidence intervals desired based on the sampling results.

The Box and Muller technique (7) is used to generate the normally
distributed pseudo random numbers representative of a normal process
with specified mean and variance. The random number generator was tested
for various sample sizes and values of the model parameters using the
chi-square goodness-of-fit test for normality. The results of these
tests were quite favorable and are summarized in Table 2.
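For readers unfamiliar with the Box and Muller transformation, a minimal Python sketch is given below. It is not the Fortran routine used in the study; the function names and the use of Python's `random.Random` as the uniform generator are assumptions made for illustration.

```python
import math
import random


def box_muller(u1, u2):
    """Map two independent uniforms, u1 in (0, 1] and u2 in [0, 1),
    to two independent standard normal deviates."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def normal_sample(n, mean, sd, rng):
    """Generate n pseudo random normal values with the given mean and sd."""
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()      # shift [0, 1) to (0, 1] to avoid log(0)
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        out.extend((mean + sd * z0, mean + sd * z1))
    return out[:n]


if __name__ == "__main__":
    xs = normal_sample(1000, 50.0, 4.0, random.Random(7))
    print(sum(xs) / len(xs))
```

Each pair of uniforms yields two normal deviates, so a sample of size n needs about n uniform draws.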

Equation (2-3) is solved iteratively for the minimum classical
sample size by using two standard UNIVAC MATH-STAT library functions
(14). The function TINORM is used to compute the value of the inverse
of the standard normal distribution given the value of the probability
for which the ordinate is to be calculated. The function STUDIN is
used to calculate the inverse of the Student's t distribution for a
given confidence coefficient. The results obtained from the subroutine
used to calculate the classical sample size for each specified value of
$\delta$ are shown in Table 1.

Table 2. Test of Normal Random Generator

Approximations for the Bayesian sample size for a given value of
delta are computed using the iterative procedure developed in the
preceding chapter. The value of the classical sample size computed for a
given value of delta is input to this subroutine, which uses this value
to calculate the first approximation of the Bayesian sample size.

Confidence intervals are computed by using the STUDIN library
function to calculate the value $t(\alpha/2,\, n^* - 1)$, where $n^*$ is the
computed classical or Bayesian sample size. The subroutine then computes
the lower and upper limits of the confidence interval, i.e.,

\[ LL = m - t(\alpha/2,\, n^*_c - 1)\, \frac{S}{\sqrt{n^*_c}} \]

and

\[ UL = m + t(\alpha/2,\, n^*_c - 1)\, \frac{S}{\sqrt{n^*_c}} \]

for the classical case, and

\[ LL = m'' - t(\alpha/2,\, n^*_b - 1)\, \frac{S}{\sqrt{n''}} \]

and

\[ UL = m'' + t(\alpha/2,\, n^*_b - 1)\, \frac{S}{\sqrt{n''}} \]

for the Bayesian case.

Demonstrating the Model

In order to demonstrate the model developed to approximate the
Bayesian sample size in the preceding chapter, various values of the
constants d, $\Delta$, and q used in the iterative procedure were tried in
preliminary simulations. The values d = 4, $\Delta = 1/4$, and q = 3/8 were
chosen for the following reasons:

a. Values of d < 4 tended to produce first approximations of
the Bayesian sample size which were too large when working with small
values of the classical sample size, $n_0$. That is, $n_1 \ge n_0 - n'_1$ after
the first approximation. Larger values of d produced more conservative
first approximations of the Bayesian sample size for small values of
$n_0$, but at the same time resulted in unreliable, i.e., greatly variable,
sample statistics.

b. Values of $\Delta < 1/4$ were rejected because for large values of
$n_0$ the number of iterations required to compute the approximate Bayesian
sample size was considerably increased. It was felt that this result
was undesirable in an Operational Testing mode and, of course, it also
meant increased computer times to solve the approximation. A scheme
of using a variable value for $\Delta$ was tried, i.e., $\Delta$ was decreased by
one-half after each iteration. This scheme was also rejected because
for larger values of $n_0$ the iterative procedure quickly evolved into
a sequential type of sampling procedure.

c. The value of q = 3/8 was selected as a reasonable choice
based on the illustration shown in Figure 3. The interval a-d
represents an interval estimate based on the posterior distribution. Then
from previous definitions, a-d = $\delta S$, and the intervals a-m" = m"-d =
$\tfrac{1}{2}\delta S$. Then if the intervals a-b = c-d = $\tfrac{1}{8}\delta S$, the sample
mean, m, is required to be within the interval b-c = $\tfrac{3}{4}\delta S$, i.e.,
$|m'' - m| \le \tfrac{3}{8}\delta S$ is the prerequisite for incorporating the prior
information into the estimation procedures. It was felt that
$\tfrac{1}{8}\delta S$ would allow for sufficient variation of the sample mean due
to differences in sample results.


Figure  3.  Separation  of  the  Posterior  and  Sample  Means 


The procedure to approximate the Bayesian sample size was
demonstrated using a hypothetical case having the following characteristics:

a. the ratio $\sigma^2/\sigma'^2 = 16$.

b. $|m' - \mu| = 5$ and $|m' - \mu| = 10$,

where $\mu$ and $\sigma^2$ are the true (but assumed unknown) values of the
parameters of the sampling process and $m'$ and $\sigma'^2$ are the parameters
of the prior distribution.

The first test of the procedure involved a computer simulation
of 100 runs for each value of delta from 1.0 to 0.2. The model was not
tested for the value of delta equal to 0.1 in this or subsequent tests
of the procedure because the large sample sizes involved required an
excessive amount of computer time. The results of this first test are
summarized in Table 3 for the case where $|m' - \mu| = 5$ and in Table 4
for the case where $|m' - \mu| = 10$. These results appear quite favorable
as shown in the percentage of reduction achieved over the classical
sample sizes. Note that the computed Bayesian sample size does not
depend on the value of $|m' - \mu|$. That is, the Bayesian sample sizes
are identical in Tables 3 and 4 for a given value of delta. The
confidence and accuracy of the interval estimates produced, i.e., the
number of times the true mean of the sampling process is contained
within the interval and the width of the interval constructed, are
comparable to the results obtained using classical methods for the
case where $|m' - \mu| = 5$. For the case where $|m' - \mu| = 10$, the desired
confidence is not achieved until the situation involving the two largest
sample sizes. The separation between the posterior and sample means
decreases as the sample size increases and the sampling information is
given more weight in the determination of the posterior distribution.
For this reason, the test suggested for determining whether or not to
use the prior information does not work well at all. For both the case
where $|m' - \mu| = 5$ and the case where $|m' - \mu| = 10$, the test rejects
the prior information too often for small sample sizes and erroneously
allows the use of the prior information in large sample sizes. It appears
that a better decision rule as to whether or not to reject the prior
information should consider the difference between the prior mean (rather
than the posterior mean) and the sample mean. The accuracy of the
approximation procedure is quite good; the overall average reduction in
the sample size for all values of delta is 12.0 samples, which equates to
approximately 75 percent of the true difference between the classical and
the Bayesian sample sizes, which is 16 samples for this particular case.

Table 3. Data for the Bayesian Approximation Model Based
on One Hundred Runs for Each Value of Delta

Table 4. Data for the Bayesian Approximation Model Based
on One Hundred Runs for Each Value of Delta

The second test of the procedure involved computing the Bayesian
sample size required for each value of delta and for various values of
$|m' - \mu|$ ranging from one standard deviation below the true mean of the
sampling process to one standard deviation above this value. The
specific values chosen for $|m' - \mu|$ and the results of the test are shown
in Table 5. The results obtained when $m'$ is within
one-half standard deviation on either side of $\mu$ are quite favorable,
with only three cases out of the total of 63 trials where the Bayesian
interval estimate did not include the true value of the mean of the
sampling process. Overall, there were a total of 24 cases, out of the
99 total trials, where the Bayesian interval estimate did not include
the true value of the mean of the sampling process.

Table 5. Data for the Bayesian Approximation Model Based
on Various Values of $|m' - \mu|$

The final test conducted on the model was to fix the value
|m' - μ| = 5 and to compute the Bayesian sample size required for each
value of delta and for various ratios of the variances, σ²/σ'². The
specific values chosen for the ratio of the variances and the results
of the test are shown in Table 6. The results obtained when the ratio
of the sampling variance to the prior variance was 4 or greater are good,
with only one case out of a total of 63 trials where the Bayesian interval
estimate did not include the true value of the mean of the sampling
process. Overall, there were a total of seven cases out of the 99
trials where the Bayesian interval estimate did not include the true
value of the mean of the sampling process.
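
The role of the variance ratio is easiest to see in the known-variance case. The Python sketch below is an illustration under that assumption, not the approximation procedure tested here. With known σ², the posterior precision is n/σ² + 1/σ'², so an interval of width k = δσ requires n = (2Z/δ)² - σ²/σ'² observations, which is the classical size reduced by the variance ratio. The function names and the inputs (α = 0.05 taken as the two-sided tail area, δ = 0.5) are hypothetical.

```python
import math
from statistics import NormalDist


def classical_size(alpha, delta):
    """Smallest n giving a two-sided interval of width k = delta * sigma."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((2.0 * z / delta) ** 2)


def bayesian_size(alpha, delta, variance_ratio):
    """Same target width when the prior is worth sigma^2 / sigma'^2 samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = (2.0 * z / delta) ** 2 - variance_ratio
    return max(0, math.ceil(n))


# Hypothetical inputs: larger variance ratios mean a more informative prior.
for ratio in (1, 4, 16):
    print(ratio, classical_size(0.05, 0.5), bayesian_size(0.05, 0.5, ratio))
```

A larger ratio σ²/σ'² corresponds to a more informative prior and therefore to a larger saving relative to the classical sample size.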


Table 6. Data for the Bayesian Approximation Model Based
on Various Values of the Ratio of the Variances

Figures marked with an asterisk (*) indicate cases where the confidence interval based
on the Bayesian sample size did not contain the true value of the mean of the sampling process.


CHAPTER  IV 


CONCLUSIONS  AND  RECOMMENDATIONS 
Conclusions 

The  results  of  this  study  indicate  the  following  conclusions. 

1. The suggested procedure to approximate Bayesian sample sizes
and construct interval estimates for the mean of the sampling process
should be used for the normal sampling process when accurate prior
information is available, that is, when the prior mean is within
one-half standard deviation of the true mean of the sampling process.

2.  In  the  worst  case,  the  procedure  will  yield  the  same  sample 
sizes  as  would  classical  techniques.  In  this  case,  the  interval  esti- 
mates should  be  based  on  the  classical  method,  since  in  essence,  the 
prior  information  has  been  rejected. 

3. The accuracy and confidence levels associated with the
interval estimates based on the approximation procedure are comparable
to those obtained by using classical techniques if the prior information
is accurate.

4. The heuristic rule suggested to determine whether or not
to use the prior information did not work well because the value
|m'' - m| is a function of the sample size as well as of the value of
the prior mean, m'.

5. The results obtained in the demonstration of the procedure,
for the values of delta selected, indicate that the procedure to
approximate the Bayesian sample size and construct interval estimates is a
viable concept which has direct applicability and value in Operational
Testing.
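
The iterative approximation referred to in these conclusions can be sketched in Python as follows. This is a simplified reading of the BAYES subroutine listed in the appendix, not a replacement for it: it starts at one quarter of the classical sample size K, estimates the prior-equivalent sample size n' = S²/σ'² from the observations taken so far, and adds one quarter of the remaining gap K - n' until n is at least K - n'. The function and argument names are hypothetical, the cap at K inside the loop is an added safeguard, and the posterior mean and interval construction are omitted.

```python
import math
import statistics


def approximate_bayes_size(samples, classical_n, prior_variance):
    """Return (n, n_prior): observations used and prior-equivalent size.

    samples        -- observations in the order they would be drawn
    classical_n    -- classical sample size K for the same interval width
    prior_variance -- variance of the normal prior on the mean
    """
    if classical_n < 2 or len(samples) < classical_n:
        raise ValueError("need at least K >= 2 observations available")
    n = max(2, math.ceil(classical_n / 4))          # first approximation
    while True:
        s2 = statistics.variance(samples[:n])       # S**2 from the first n
        n_prior = math.ceil(s2 / prior_variance)    # n' = S**2 / sigma'**2
        if n >= classical_n - n_prior:              # stopping rule
            return n, n_prior
        increment = math.ceil((classical_n - n_prior) / 4)
        n = min(n + increment, classical_n)         # never exceed K


# Hypothetical data: alternating 0 and 2, classical size 40, prior variance 0.1.
print(approximate_bayes_size([0.0, 2.0] * 20, 40, 0.1))
```

Because each pass requires a fresh batch of observations, the number of passes translates directly into the scheduling and set-up burden of a sequential test.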


Recommendations 

As in most cases involving research of a limited scope, perhaps
more problems are unearthed than are resolved in this study. The
limited results obtained, however, show some merit and applicability
to Operational Testing. As a matter of future research in the area
covered by this study, the following recommendations are suggested.

1. Further efforts are required to improve the iterative
procedure used to approximate the Bayesian sample size. A refined
procedure should take into account the need to treat large and small
sample sizes as separate problems. Perhaps the increment added to the
approximation at any specific iteration should be some function of the
number of iterations already conducted. Care must be taken, however,
that any procedure developed for this situation be compatible with the
Operational Testing environment, where ease of application and
simplicity are prime objectives.

2. The sample standard deviation, S, in equation (2-13), is the
only variable in the equation for a specific sample size. This
particular random variable is related to the chi-square distribution.
Perhaps further work with this particular element of the expression for
the approximate Bayesian sample size would lead to more accurate
approximations of the equation.

3. A workable decision rule for determining whether or not to
use the prior information is needed. It is suggested that the
relationship between the prior and sample means, i.e., |m' - m|, will yield
more viable results than the technique used in this study. Obviously,
whatever rule is developed must treat the differences associated
with large and small sample sizes separately.

4. There are obvious limitations in applying this procedure to
Operational Testing. Although the procedure holds some potential for
reducing costs associated with Operational Testing by reducing the
number of replications required of a specific test, any iterative
sampling scheme is inherently difficult and costly to apply because of
the problems involved with multiple scheduling and set-up costs. The
procedure seems better suited to those testing situations where a large
number of samples is required and the cost of sampling is relatively
low. For these reasons, a scheme to incorporate the concept of loss
functions into this procedure is needed before it can assume the cloak
of a true decision making procedure.


BAYESIAN PROCEDURE

-RFOR,IS MAIN


      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/EIGHT/ NB(10), NPRIM(10), DIFF
      COMMON/NINE/ WIDTH(2)
      COMMON/TEN/ LOOP(2), KEY, DMAX
      EXTERNAL UNIF

C***** READ IN BASIC PARAMETERS
C      INDEX 1 = PRIOR, INDEX 2 = SAMPLING PROCESS (LIKELIHOOD)
    8 FORMAT( )
      READ(5,8) ALPHA
      READ(5,8) NSU
      READ(5,8) XMEAN(1), XVAR(1)
      READ(5,8) XMEAN(2), XVAR(2)

C***** START UP UNIFORM GENERATOR TO RANDOMIZE STARTING POINT
      DO 10 J=1, NSU
      D= UNIF(A)
   10 CONTINUE

      DO 100 KK= 1, 30
      READ(5,8,END=999) DELTA

C***** DETERMINE THE MINIMUM CLASSICAL SAMPLE SIZE
      CALL CLASS( N(1) )
      CALL RANDN(1)

C***** DETERMINE THE MINIMUM BAYESIAN SAMPLE SIZE, IF APPROPRIATE
      CALL BAYES( N(1), N(2) )

C***** COMPUTE CONFIDENCE INTERVALS FOR THE DATA PROCESSES
      CALL CONFID( N(3), 3 )
      CALL ORDER(1)
      CALL CONFID( N(1), 1 )

C***** PRINT OUTPUT
      CALL OUTPUT

  100 CONTINUE
  999 CONTINUE
      WRITE(6,73)
   73 FORMAT(1H1)
      STOP
      END


-RFOR,IS RANDN

C***** THIS SUBROUTINE GENERATES NORMALLY DISTRIBUTED PSEUDO-RANDOM
C      NUMBERS HAVING A SPECIFIED MEAN AND VARIANCE
C
C***** ARGUMENT DEFINITION
C      X IS THE ARRAY OF RANDOM NUMBERS (OUTPUT)
C      N IS THE NUMBER OF RANDOM NUMBERS DESIRED (INPUT)
C      XMEAN IS THE MEAN OF THE RANDOM NUMBERS (INPUT)
C      XVAR IS THE VARIANCE OF THE RANDOM NUMBERS (INPUT)
C
C***** THIS SUBROUTINE USES THE BOX AND MULLER METHOD FOR
C      GENERATION OF NORMAL PSEUDO-RANDOM NUMBERS

      SUBROUTINE RANDN(J)
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      EXTERNAL UNIF
      TPI= 6.2831853
C***** EACH PASS PRODUCES TWO DEVIATES FROM ONE PAIR OF UNIFORMS AND
C      STORES THEM IN BOTH ROWS OF X (CLASSICAL AND BAYESIAN DATA)
      DO 100 I=1, N(J), 2
      A= UNIF(1)
      B= UNIF(2)
      X(1,I)= XMEAN(2)+ SQRT(-2.0*XVAR(2)*ALOG(A))*COS(TPI*B)
      X(2,I)= X(1,I)
      II= I+ 1
      X(1,II)= XMEAN(2)+ SQRT(-2.0*XVAR(2)*ALOG(A))*SIN(TPI*B)
      X(2,II)= X(1,II)
  100 CONTINUE
      RETURN
      END
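
The Box and Muller transformation used by RANDN can be illustrated outside the listing. The following is a minimal Python sketch, not part of the original program: the function names are hypothetical, and Python's `random.random` stands in for the UNIF generator.

```python
import math
import random


def box_muller_pair(mean, variance, u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to two normal deviates."""
    radius = math.sqrt(-2.0 * variance * math.log(u1))
    angle = 2.0 * math.pi * u2
    return mean + radius * math.cos(angle), mean + radius * math.sin(angle)


def normal_sample(n, mean, variance, rng=random.random):
    """Draw n normal deviates; rng stands in for the UNIF generator."""
    values = []
    while len(values) < n:
        u1 = 1.0 - rng()  # shift [0, 1) to (0, 1] so log(u1) is defined
        values.extend(box_muller_pair(mean, variance, u1, rng()))
    return values[:n]
```

As in the listing, each pair of uniform numbers yields two deviates, one from the cosine term and one from the sine term.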


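-RFOR,IS UNIF

      FUNCTION UNIF(A)
      DATA IY/94581/
C***** MULTIPLICATIVE CONGRUENTIAL GENERATOR WITH MULTIPLIER 5**5
      IY= IY*3125
      IF(IY) 5,6,6
C***** ON OVERFLOW TO A NEGATIVE VALUE, ADD 2**35 TO RESTORE THE SIGN
    5 IY= IY+ 1+ 34359738367
    6 YFL= IY
      UNIF= YFL*2.0**(-35)
      RETURN
      END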
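
The generator in UNIF can be restated without reference to machine word length. The Python sketch below assumes that the overflow correction in the listing is intended to reduce the product modulo 2**35; that reading, the function name, and any seed passed in are assumptions and are not taken from the original program.

```python
MODULUS = 2 ** 35
MULTIPLIER = 3125  # 5 ** 5, the multiplier used in UNIF


def make_unif(seed):
    """Return a function that yields successive uniforms in (0, 1)."""
    state = seed % MODULUS

    def unif():
        nonlocal state
        state = (state * MULTIPLIER) % MODULUS
        return state / MODULUS

    return unif
```

An odd seed keeps the state odd, so the sequence never reaches zero.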


-RFOR,IS ORDER

C***** THIS SUBROUTINE SORTS A GIVEN SET OF DATA FROM THE LOWEST
C      VALUE TO THE HIGHEST, AND COMPUTES THE SAMPLE STATISTICS
C      (MEAN AND STANDARD DEVIATION) OF THE DATA PROCESS
C
C***** ARGUMENT DEFINITION
C      X= THE ARRAY OF DATA VALUES TO BE SORTED (INPUT/OUTPUT)
C      N= THE NUMBER OF DATA POINTS (INPUT)
C      XHAT= THE SAMPLE MEAN OF THE DATA PROCESS (OUTPUT)
C      SHAT= THE SAMPLE STANDARD DEVIATION OF THE DATA
C            PROCESS (OUTPUT)

      SUBROUTINE ORDER(K)
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      NM1= N(K)- 1
      DO 200 I=1, NM1
      IP1= I+ 1
      DO 100 J= IP1, N(K)
      IF( X(K,I) .LE. X(K,J) ) GO TO 100
      TEMP= X(K,I)
      X(K,I)= X(K,J)
      X(K,J)= TEMP
  100 CONTINUE
  200 CONTINUE

C***** COMPUTE THE SAMPLE STATISTICS FOR THE DATA PROCESS
      SUM1= 0.0
      SUM2= 0.0
      DO 300 I=1, N(K)
      SUM1= SUM1+ X(K,I)
      SUM2= SUM2+ X(K,I)**2
  300 CONTINUE
      YN= N(K)
      XHAT(K)= SUM1/YN
      RN= YN- 1.0
      SUM22= SUM2- (SUM1**2)/YN
      SHAT(K)= SQRT(SUM22/RN)
      RETURN
      END


-RFOR,IS CLASS

C***** THIS SUBROUTINE CALCULATES THE MINIMUM CLASSICAL SAMPLE SIZE
C      REQUIRED TO CONSTRUCT A CONFIDENCE INTERVAL OF GIVEN WIDTH
C      ABOUT THE MEAN OF A NORMAL SAMPLING POPULATION OF UNKNOWN
C      VARIANCE
C
C***** ARGUMENT DEFINITION
C      ALPHA= THE CONFIDENCE COEFFICIENT (INPUT)
C      DELTA= A FUNCTION OF THE INTERVAL WIDTH (INPUT)
C      NCLASS= THE COMPUTED SAMPLE SIZE (OUTPUT)

      SUBROUTINE CLASS( NCLASS )
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/TEN/ LOOP(2), KEY, DMAX

C***** COMPUTE THE FIRST APPROXIMATION OF THE CLASSICAL SAMPLE
C      SIZE, NC(1), BASED ON THE STANDARD NORMAL DISTRIBUTION, WHICH
C      IS IDENTICAL TO THE T DISTRIBUTION WITH INFINITE DEGREES
C      OF FREEDOM
      ALPHA1= ALPHA/2.0
      S= TINORM( ALPHA1, $15 )
      GO TO 18
   15 WRITE(6,17)
   17 FORMAT(//, 10X, 67H ERROR MESSAGE--OVERFLOW ON INVERSE NORMAL DIST
     1RIBUTION--FORMAT 15 )
      CALL EXIT
   18 CONTINUE
      REALC= (2.0*S/DELTA)**2
      NC(1)= INT(REALC)
      IF( NC(1) .LT. REALC ) NC(1)= NC(1)+ 1

C***** COMPUTE THE SUCCEEDING APPROXIMATIONS OF THE CLASSICAL SAMPLE
C      SIZE, NC(J), BASED ON THE T DISTRIBUTION WITH DEGREES OF
C      FREEDOM EQUAL TO NC(J-1)- 1. STOP THE ITERATIVE PROCEDURE
C      WHEN NC(J) IS EQUAL TO NC(J-1)
      DO 30 J=2, 10
      NDF= NC(J-1)- 1
      T= STUDIN( ALPHA, NDF, $21 )
      GO TO 24
   21 WRITE(6,23)
   23 FORMAT(//,10X, 73H ERROR MESSAGE--OVERFLOW ON STUDENT'S T DISTRIBU
     1TION FUNCTION--FORMAT 21 )
      CALL EXIT
   24 CONTINUE
      REALC= (2.0*T/DELTA)**2
      NC(J)= INT(REALC)
      IF( NC(J) .LT. REALC ) NC(J)= NC(J)+ 1
      IF( NC(J) .EQ. NC(J-1) ) GO TO 35
   30 CONTINUE

C***** ASSIGN THE COMPUTED SAMPLE SIZE
   35 NCLASS= NC(J)
      LOOP(1)= J
      RETURN
      END


-RFOR,IS BAYES

C***** THIS SUBROUTINE CALCULATES THE MINIMUM BAYESIAN SAMPLE SIZE,
C      NBAYES, REQUIRED TO CONSTRUCT A CONFIDENCE INTERVAL OF GIVEN
C      WIDTH ABOUT THE MEAN OF A NORMAL SAMPLING POPULATION
C      WITH UNKNOWN VARIANCE
C
C***** ARGUMENT DEFINITION
C      K= THE MINIMUM CLASSICAL SAMPLE SIZE (INPUT)
C      NBAYES= THE COMPUTED SAMPLE SIZE (OUTPUT)

      SUBROUTINE BAYES( K, NBAYES )
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/EIGHT/ NB(10), NPRIM(10), DIFF
      COMMON/TEN/ LOOP(2), KEY, DMAX

C***** COMPUTE THE FIRST APPROXIMATION, N(1), OF THE BAYESIAN
C      SAMPLE SIZE
      REAL3= FLOAT(K)/4.0
      NB(1)= INT(REAL3)
      IF( NB(1) .LT. REAL3 ) NB(1)= NB(1)+ 1
      N(2)= NB(1)

C***** TAKE N(1) SAMPLES AND COMPUTE THE SAMPLE STATISTICS FOR THE
C      DATA PROCESS AND THE POSTERIOR PARAMETERS BASED ON THESE
C      N(1) OBSERVATIONS
      CALL ORDER(2)
      APPN= SHAT(2)**2/XVAR(1)
      NPRIM(1)= INT(APPN)
      IF( NPRIM(1) .LT. APPN ) NPRIM(1)= NPRIM(1)+ 1
      XMEAN(3)= ( NPRIM(1)*XMEAN(1)+ NB(1)*XHAT(2) )/
     1 FLOAT( NPRIM(1)+ NB(1) )
      DIFF= ABS( XMEAN(3)- XHAT(2) )
      DMAX= DIFF
      KEY= 1
      IF( N(2) .GE. K- NPRIM(1) ) GO TO 55

C***** COMPUTE THE SUCCEEDING APPROXIMATIONS, N(J), OF THE BAYESIAN
C      SAMPLE SIZE. STOP THE ITERATIVE PROCEDURE WHEN N(J) IS
C      GREATER THAN OR EQUAL TO K- NPRIM(J)
      DO 100 J=2, 20
      RINC= FLOAT( K- NPRIM(J-1) )/4.0
      INC= INT( RINC )
      IF( INC .LT. RINC ) INC= INC+ 1
      NB(J)= NB(J-1)+ INC
      N(2)= NB(J)

C***** TAKE EACH SUCCEEDING N(J) SAMPLES AND COMPUTE THE SAMPLE
C      STATISTICS FOR THE DATA PROCESS AND THE POSTERIOR PARAMETERS
C      BASED ON THESE N(J) OBSERVATIONS
      CALL ORDER(2)
      APPN= SHAT(2)**2/XVAR(1)
      NPRIM(J)= INT(APPN)
      IF( NPRIM(J) .LT. APPN ) NPRIM(J)= NPRIM(J)+ 1
      XMEAN(3)= ( NPRIM(J)*XMEAN(1)+ NB(J)*XHAT(2) )/
     1 FLOAT( NPRIM(J)+ NB(J) )
      DIFF= ABS( XMEAN(3)- XHAT(2) )
      IF( DIFF .LE. DMAX ) GO TO 35
      DMAX= DIFF
      KEY= J
   35 IF( N(2) .GE. K- NPRIM(J) ) GO TO 45
  100 CONTINUE

C***** ASSIGN THE SAMPLE SIZE COMPUTED ABOVE TO NBAYES AND DETERMINE
C      THE POSTERIOR (POOLED) SAMPLE SIZE
   55 CONTINUE
      NBAYES= N(2)
      IF( NBAYES .GT. K ) NBAYES= K
      N(3)= NBAYES+ NPRIM(1)
      XHAT(3)= XMEAN(3)
      SHAT(3)= SHAT(2)
      LOOP(2)= 1
      GO TO 999
   45 CONTINUE
      NBAYES= NB(J)
      IF( NBAYES .GT. K ) NBAYES= K
      N(3)= NBAYES+ NPRIM(J)
      XHAT(3)= XMEAN(3)
      SHAT(3)= SHAT(2)
      LOOP(2)= J
  999 RETURN
      END


-RFOR,IS CONFID

C***** THIS SUBROUTINE CALCULATES A CONFIDENCE INTERVAL FOR THE MEAN
C      OF A NORMAL POPULATION WHEN THE VARIANCE IS UNKNOWN
C
C***** ARGUMENT DEFINITION
C      N= THE NUMBER OF DATA POINTS IN THE SAMPLE (INPUT)
C      ALPHA= THE CONFIDENCE COEFFICIENT (INPUT)
C      XHAT= THE SAMPLE MEAN OF THE DATA PROCESS (INPUT)
C      SHAT= THE SAMPLE STANDARD DEVIATION OF THE DATA
C            PROCESS (INPUT)
C      UL= THE LOWER CONFIDENCE LIMIT FOR THE MEAN (OUTPUT)
C      UU= THE UPPER CONFIDENCE LIMIT FOR THE MEAN (OUTPUT)

      SUBROUTINE CONFID( N, J )
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)

C***** COMPUTE THE DEGREES OF FREEDOM ASSOCIATED WITH THE SAMPLE
      NDF= N- 1

C***** DETERMINE THE VALUE OF THE STUDENT'S T DISTRIBUTION AT A
C      SIGNIFICANCE LEVEL = ALPHA
C      NOTE--THIS OPERATION USES A STAT-PACK FUNCTION CALLED STUDIN
C      TO CALCULATE THE INVERSE STUDENT'S T VALUE GIVEN THE
C      CONFIDENCE COEFFICIENT ALPHA
      T= STUDIN( ALPHA, NDF, $10 )
      GO TO 700
   10 WRITE(6,15)
   15 FORMAT(//,10X, 74H ERROR MESSAGE--OVERFLOW ON STUDENT'S T DISTRIBU
     1TION FUNCTION--FORMAT 700 )
      CALL EXIT
  700 CONTINUE
      YN= N

C***** COMPUTE THE LOWER CONFIDENCE LIMIT
      UL(J)= XHAT(J)- T*( SHAT(J)/SQRT(YN) )

C***** COMPUTE THE UPPER CONFIDENCE LIMIT
      UU(J)= XHAT(J)+ T*( SHAT(J)/SQRT(YN) )
      RETURN
      END


-RMAP
LIB SYSTEM$*MATHSTAT
-XOT

-RFOR,IS OUTPUT

      SUBROUTINE OUTPUT
      COMMON/ONE/ X(2,1000), N(3)
      COMMON/TWO/ XMEAN(3), XVAR(3)
      COMMON/THREE/ XHAT(3), SHAT(3)
      COMMON/FIVE/ ALPHA, UL(3), UU(3)
      COMMON/SEVEN/ NC(5), DELTA
      COMMON/EIGHT/ NB(10), NPRIM(10), DIFF
      COMMON/NINE/ WIDTH(2)
      COMMON/TEN/ LOOP(2), KEY, DMAX

C***** PRINT HEADINGS FOR PRINTED OUTPUT
C      J = 1 IS THE CLASSICAL ANALYSIS, J = 2 THE BAYESIAN ANALYSIS
      DO 100 J=1, 2
      WRITE(6,15)
   15 FORMAT(1H1)
      IF( J .EQ. 2 ) GO TO 40
      WRITE(6,35)
   35 FORMAT(///, 40X, 43H DATA VALUES USED IN THE CLASSICAL ANALYSIS )
      GO TO 50
   40 WRITE(6,45)
   45 FORMAT(///, 40X, 42H DATA VALUES USED IN THE BAYESIAN ANALYSIS )

C***** PRINT BASIC PARAMETERS ASSOCIATED WITH EACH DATA PROCESS
   50 CONTINUE
      WRITE(6,52) N(J)
   52 FORMAT(///, 10X, 26H NUMBER OF OBSERVATIONS = , I4)
      WRITE(6,54) XMEAN(2)
   54 FORMAT(10X, 19H LIKELIHOOD MEAN = , F8.3)
      IF( J .EQ. 2 ) WRITE(6,56) XMEAN(1)
   56 FORMAT(1H+, T82, 14H PRIOR MEAN = , F8.3)
      WRITE(6,58) XVAR(2)
   58 FORMAT(10X, 23H LIKELIHOOD VARIANCE = , F8.3)
      IF( J .EQ. 2 ) WRITE(6,60) XVAR(1)
   60 FORMAT(1H+, T82, 18H PRIOR VARIANCE = , F8.3)
      WRITE(6,62) DELTA
   62 FORMAT(10X, 9H DELTA = , F4.2)

C***** PRINT THE DATA VALUES GENERATED BY RANDN
      WRITE(6,65) ( X(J,I), I=1, N(J) )
   65 FORMAT(///, 10(3X, F8.3) )

C***** PRINT THE SAMPLE STATISTICS OF THE DATA PROCESS
      IF( J .EQ. 2 ) WRITE(6,72) XHAT(3)
   72 FORMAT(///, 10X, 47H THE MEAN OF THE POSTERIOR DISTRIBUTION, M'' =
     1 , F10.5)
      WRITE(6,75) XHAT(J), SHAT(J)
   75 FORMAT(///, 10X, 45H THE SAMPLE MEAN OF THE DATA PROCESS, XHAT = ,
     1  F10.5, ///, 10X, 59H THE SAMPLE STANDARD DEVIATION OF THE DATA P
     2ROCESS, SHAT = , F10.5 )

C***** PRINT THE (1-ALPHA) CONFIDENCE INTERVAL ASSOCIATED WITH EACH
C      PROCESS
      WIDTH(J)= DELTA*SHAT(J)
      WRITE(6,85) WIDTH(J)
   85 FORMAT(//, 10X, 51H THE DESIRED WIDTH OF THE CONFIDENCE INTERVAL I
     1S = , F6.2 )
      UL(2)= UL(3)
      UU(2)= UU(3)
      WRITE(6,95) ALPHA, UL(J), UU(J)
   95 FORMAT(//, 10X, 47H THE (1-ALPHA) CONFIDENCE INTERVAL FOR THE MEAN
     1 , /, 10X, 38H WITH CONFIDENCE COEFFICIENT, ALPHA = , F4.3,
     2 8H, IS = ( , F8.3, 2H, , F8.3, 1H) )
      IF( J .EQ. 1 ) GO TO 97
      WRITE(6,98) DIFF
   98 FORMAT(//,10X, 72H THE ABSOLUTE DIFFERENCE BETWEEN THE POSTERIOR A
     1ND SAMPLE MEANS, DIFF = , F6.3)
      WRITE(6,99) DMAX, KEY
   99 FORMAT(//, 10X, 7H DMAX =, F6.3, 10H AT LOOP =, I2)
   97 CONTINUE
      WRITE(6,96) LOOP(J)
   96 FORMAT(//, 10X, 9H LOOPS = , I2)
  100 CONTINUE
      RETURN
      END




APPENDIX  II 

FORTRAN  PROGRAM  FOR  THE  CHI-SQUARE 
TEST  OF  NORMALITY 


-RFOR,IS CHISQ

C***** THIS SUBROUTINE TAKES A SET OF ORDERED DATA (ARRANGED FROM
C      THE LOWEST TO THE HIGHEST VALUE) AND
C      (1) ESTABLISHES K EQUAL-PROBABILITY CELLS, WHERE K DEPENDS ON
C      THE SAMPLE SIZE, I.E., K= 20 FOR N .GE. 100, K= 10 FOR N .GE.
C      50 .AND. .LT. 100, AND K= 5 FOR N .LT. 50
C      (2) PERFORMS A CHI-SQUARE GOODNESS-OF-FIT TEST FOR NORMALITY
C      ON THE DATA SAMPLE AND DETERMINES THE SIGNIFICANCE LEVEL
C      AT WHICH WE CAN ASSUME THAT THE DATA SAMPLE IS IN FACT
C      REPRESENTATIVE OF A NORMAL PROCESS
C
C***** ARGUMENT DEFINITION
C      X= THE ARRAY OF DATA VALUES TO BE TESTED (INPUT)
C      N= THE NUMBER OF DATA POINTS (INPUT)
C      K= THE NUMBER OF CELLS INTO WHICH THE DATA IS DIVIDED (INPUT)
C      XHAT= THE SAMPLE MEAN OF THE DATA PROCESS
C      SHAT= THE SAMPLE STANDARD DEVIATION OF THE DATA PROCESS
C      CHIS= THE CHI-SQUARE STATISTIC COMPUTED FROM
C            THE DATA (OUTPUT)
C      SIGL= THE SIGNIFICANCE LEVEL OF THE TEST (OUTPUT)

      SUBROUTINE CHISQ
      COMMON/ONE/ X(500), N
      COMMON/TWO/ XMEAN, XVAR
      COMMON/FOUR/ K, KLESS1, CHIS, SIGL
      COMMON/SIX/ CBSTRD(19), CBNORM(19), KOUNT(20)
      DIMENSION ALPHA(19)

C***** SET ALL CELL COUNTERS TO ZERO
      DO 5 I=1, K
      KOUNT(I)= 0
    5 CONTINUE

C***** COMPUTE THE CELL-BREAK POINTS FOR THE GIVEN DATA
C      NOTE--THIS OPERATION USES A STAT-PACK FUNCTION CALLED TINORM
C      TO COMPUTE THE VALUE OF THE INVERSE OF THE NORMAL (0,1) DISTR.
      IF( K- 10 ) 10, 20, 30
   10 DO 15 I=1, KLESS1
      ALPHA(I)= 0.2*I
   15 CONTINUE
      GO TO 50
   20 DO 25 I=1, KLESS1
      ALPHA(I)= 0.1*I
   25 CONTINUE
      GO TO 50
   30 DO 35 I=1, KLESS1
      ALPHA(I)= 0.05*I
   35 CONTINUE
   50 DO 100 I=1, KLESS1
      CBSTRD(I)= TINORM( ALPHA(I), $70 )
      CBNORM(I)= CBSTRD(I)*SQRT(XVAR)+ XMEAN
      GO TO 100
   70 WRITE(6,90)
   90 FORMAT(//, 10X, 68H ERROR MESSAGE--OVERFLOW ON INVERSE NORMAL DIST
     1RIBUTION--FORMAT 100 )
  100 CONTINUE

C***** COUNT THE NUMBER OF OBSERVATIONS FALLING IN EACH CELL
      DO 300 I=1, N
      DO 200 J=1, KLESS1
      IF( X(I) .GT. CBNORM(J) ) GO TO 190
      KOUNT(J)= KOUNT(J)+ 1
      GO TO 300
  190 IF( J .EQ. KLESS1 ) KOUNT(K)= KOUNT(K)+ 1
  200 CONTINUE
  300 CONTINUE

C***** COMPUTE THE CHI-SQUARE STATISTIC, CHIS
      CHI1= 0.0
      RN= FLOAT( N )/ FLOAT( K )
      DO 400 I=1, K
      CHI1= CHI1+ ( KOUNT(I)- RN )**2
  400 CONTINUE
      CHIS= CHI1/RN

C***** DETERMINE THE SIGNIFICANCE LEVEL OF THE TEST
C      NOTE--THIS OPERATION USES A STAT-PACK FUNCTION CALLED CHI TO
C      DETERMINE THE CHI-SQUARE DISTRIBUTION GIVEN THE POINT AND
C      THE DEGREES OF FREEDOM
      NDF= K- 3
      CUMC= CHI( CHIS, NDF, $600 )
      SIGL= 1.0- CUMC
      GO TO 690
  600 WRITE(6,610) CHIS
  610 FORMAT(//,10X, 72H ERROR MESSAGE--OVERFLOW ON CHI-SQUARE DISTRIBUT
     1ION FUNCTION--FORMAT 600, 24H CHI-SQUARE STATISTIC = , F8.2)
  690 RETURN
      END
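
The cell construction and the statistic computed by CHISQ can be sketched in Python as follows. This is an illustration only, not part of the original program: the function name is hypothetical, `statistics.NormalDist.inv_cdf` stands in for the STAT-PACK function TINORM, and the significance level is not computed because the Python standard library has no chi-square distribution function.

```python
import bisect
from statistics import NormalDist


def chi_square_normality(data, mean, variance, cells):
    """Return the chi-square statistic for `cells` equal-probability cells."""
    dist = NormalDist(mean, variance ** 0.5)
    # Interior cell boundaries at probabilities 1/cells, 2/cells, ...
    breaks = [dist.inv_cdf(i / cells) for i in range(1, cells)]
    counts = [0] * cells
    for x in data:
        # bisect_left puts x == breaks[j] into cell j, matching the
        # listing's "not greater than the break point" rule.
        counts[bisect.bisect_left(breaks, x)] += 1
    expected = len(data) / cells
    return sum((c - expected) ** 2 for c in counts) / expected
```

The resulting statistic would be referred to a chi-square distribution with K - 3 degrees of freedom, as in the listing, since two parameters are estimated from the data.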
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